Overview
The EDL pipeline can be configured using three boolean flags at the top of `run_full_pipeline.py`. These settings control data fetching behavior, optional datasets, and cleanup of intermediate files.
Configuration Flags
All configuration flags are defined in `run_full_pipeline.py` at lines 61-71.
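The flag block looks roughly like the following. This sketch is reconstructed from the descriptions on this page. The values shown are placeholders and are not necessarily the defaults shipped in `run_full_pipeline.py`.

```python
# Pipeline configuration flags (sketch: values are placeholders,
# not necessarily the shipped defaults in run_full_pipeline.py)

# Fetch historical OHLCV data (needed for ADR, RVOL, % from ATH, etc.)
FETCH_OHLCV = True

# Run PHASE 6: standalone index and ETF datasets
FETCH_OPTIONAL = False

# Delete intermediate files after successful compression
CLEANUP_INTERMEDIATE = True
```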
FETCH_OHLCV
Controls whether to fetch historical OHLCV (Open, High, Low, Close, Volume) data for all stocks.
True: Fetches lifetime OHLCV data using smart incremental updates
- First run: ~30 minutes (downloads full history from 1976)
- Subsequent runs: ~2-5 minutes (only fetches new data)
- Enables ADR, RVOL, ATH, and % from ATH calculations
False: Skips OHLCV fetching entirely
- Pipeline runs ~4 minutes faster
- Fields that depend on OHLCV will show 0 or null: 5/14/20/30 Days MA ADR(%), RVOL, % from ATH, Returns since Earnings(%)
When to disable:
- Testing pipeline changes without needing price data
- Running quick fundamental-only refreshes
- Network bandwidth constraints
Files written (when True):
- Creates/updates `ohlcv_data/{SYMBOL}.csv` (one file per stock)
- Creates/updates the `indices_ohlcv_data/` directory for index data
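This page does not document how the "smart incremental update" works. One plausible approach is to read the last date already stored in `ohlcv_data/{SYMBOL}.csv` and request only newer rows. The sketch below follows that approach under several assumptions. The CSV layout is hypothetical: a `Date` column in ISO `YYYY-MM-DD` format, with rows sorted in ascending order. The helper name `next_fetch_start` and the exact 1976 start date are also hypothetical.

```python
import csv
from datetime import date, timedelta
from pathlib import Path

# Hypothetical first-run start date ("full history from 1976")
FULL_HISTORY_START = date(1976, 1, 1)


def next_fetch_start(csv_path: Path) -> date:
    """Return the first date that still needs fetching for one symbol.

    Hypothetical helper: assumes a CSV with a 'Date' column in YYYY-MM-DD
    format, sorted ascending. A missing or empty file means a first run,
    so the full history is fetched.
    """
    if not csv_path.exists():
        return FULL_HISTORY_START
    last_date = None
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            last_date = row["Date"]  # keep overwriting; the final row wins
    if last_date is None:
        return FULL_HISTORY_START
    # Resume the day after the newest stored bar
    return date.fromisoformat(last_date) + timedelta(days=1)
```

With this design, a first run and a resumed run share one code path. Only the start date differs, which is what makes later runs much shorter than the first.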
FETCH_OPTIONAL
Enables fetching of standalone datasets not included in the main pipeline output.
True: Runs PHASE 6 scripts to fetch:
- All market indices (`all_indices_list.json`): 194 indices
- ETF data (`etf_data_response.json`): 361 ETFs
False: Skips PHASE 6 entirely
| Script | Output File | Records | Description |
|---|---|---|---|
| `fetch_all_indices.py` | `all_indices_list.json` | 194 | Nifty 50, Bank Nifty, sectoral indices |
| `fetch_etf_data.py` | `etf_data_response.json` | 361 | All exchange-traded funds |
These files are not included in `all_stocks_fundamental_analysis.json.gz`. The frontend uses them separately for index tracking and ETF screening.
When to enable:
- You need fresh index composition data
- Building ETF comparison features
- Running a full data refresh for all asset classes
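The sketch below shows how the flag might gate PHASE 6. It runs the two scripts from the table as subprocesses only when `FETCH_OPTIONAL` is true. Invoking them through `subprocess` with the current interpreter is an assumption, since the real pipeline may import and call them differently. `run_phase6` and its `dry_run` parameter are hypothetical.

```python
import subprocess
import sys

# Scripts listed in the PHASE 6 table
PHASE6_SCRIPTS = ["fetch_all_indices.py", "fetch_etf_data.py"]


def run_phase6(fetch_optional: bool, dry_run: bool = False) -> list[str]:
    """Hypothetical helper: run (or, with dry_run=True, only list) PHASE 6 scripts.

    Returns the scripts that were selected. When the flag is off,
    the phase is skipped entirely and an empty list is returned.
    """
    if not fetch_optional:
        return []
    for script in PHASE6_SCRIPTS:
        if not dry_run:
            # check=True raises CalledProcessError if a script fails
            subprocess.run([sys.executable, script], check=True)
    return list(PHASE6_SCRIPTS)
```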
CLEANUP_INTERMEDIATE
Auto-deletes intermediate files after successful pipeline completion.
True: Removes all intermediate files and directories after compression
- Keeps only `*.json.gz` files, `ohlcv_data/`, and `indices_ohlcv_data/`
- Frees ~150-200 MB of disk space
False: Preserves all intermediate files for debugging
When to disable:
- Debugging pipeline failures
- Inspecting intermediate data quality
- Running custom analysis on raw outputs
- Developing new pipeline stages
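Here is a minimal sketch of the cleanup step. It assumes that intermediate outputs sit alongside the compressed files in a single output directory. That directory layout and the function name `cleanup_intermediate` are hypothetical; the keep-list mirrors the one described above.

```python
import shutil
from pathlib import Path

# Directories that survive cleanup (from the keep-list above)
KEEP_DIRS = {"ohlcv_data", "indices_ohlcv_data"}


def cleanup_intermediate(output_dir: Path) -> list[str]:
    """Hypothetical helper: delete everything except *.json.gz and KEEP_DIRS.

    Returns the names of removed entries. Call it only after compression
    has succeeded; otherwise the uncompressed data is lost.
    """
    removed = []
    for entry in sorted(output_dir.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            if entry.name in KEEP_DIRS:
                continue
            shutil.rmtree(entry)
        else:
            if entry.name.endswith(".json.gz"):
                continue
            entry.unlink()  # regular files and symlinks
        removed.append(entry.name)
    return removed
```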
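Modifying Configuration
To change a setting, edit the flag's value at the top of `run_full_pipeline.py`, then rerun the pipeline.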
Common Configuration Scenarios
Quick Fundamental Refresh (No OHLCV)
Use case: Testing, quick fundamental updates
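Below is a plausible flag combination for this scenario. Only `FETCH_OHLCV = False` follows directly from the heading. Leaving `FETCH_OPTIONAL` off for speed and `CLEANUP_INTERMEDIATE` on are assumptions.

```python
# Quick fundamental refresh: skip price history
FETCH_OHLCV = False          # OHLCV-dependent fields will be 0 or null
FETCH_OPTIONAL = False       # assumption: skip PHASE 6 to stay fast
CLEANUP_INTERMEDIATE = True  # assumption: intermediates not needed
```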
Full Production Refresh
Use case: Daily automated refresh, complete data update
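For a complete daily refresh, turning all three flags on is the natural reading of "complete data update". Treat this as a suggested setting rather than a documented one.

```python
# Full production refresh: everything on (suggested, not documented)
FETCH_OHLCV = True           # incremental after the first run (~2-5 min)
FETCH_OPTIONAL = True        # also refresh indices and ETFs (PHASE 6)
CLEANUP_INTERMEDIATE = True  # frees ~150-200 MB after compression
```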
Development/Debugging Mode
Use case: Inspecting intermediate outputs, debugging pipeline stages
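For debugging, the key setting is `CLEANUP_INTERMEDIATE = False`, so intermediate outputs survive the run. Disabling both fetch flags to shorten iterations is an assumption. Re-enable them if the stage you are debugging needs price, index, or ETF data.

```python
# Development/debugging: keep intermediate files for inspection
FETCH_OHLCV = False           # assumption: skip the slow price fetch while iterating
FETCH_OPTIONAL = False        # assumption: PHASE 6 not needed for most stages
CLEANUP_INTERMEDIATE = False  # preserve intermediate outputs
```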
Impact on Output Fields
When `FETCH_OHLCV = False`, the following fields in `all_stocks_fundamental_analysis.json.gz` will be 0 or null:
| Field | Default Value (No OHLCV) |
|---|---|
| 5 Days MA ADR(%) | 0 |
| 14 Days MA ADR(%) | 0 |
| 20 Days MA ADR(%) | 0 |
| 30 Days MA ADR(%) | 0 |
| RVOL | 0 |
| % from ATH | 0 |
| Returns since Earnings(%) | 0 |
| Max Returns since Earnings(%) | 0 |
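To check whether an output file was produced without OHLCV data, you can count the records whose OHLCV-dependent fields are all 0 or null. This sketch assumes `all_stocks_fundamental_analysis.json.gz` decompresses to a JSON array of objects keyed by the field names in the table. That layout and the helper `count_missing_ohlcv` are assumptions.

```python
import gzip
import json

# Field names from the table above
OHLCV_FIELDS = [
    "5 Days MA ADR(%)",
    "14 Days MA ADR(%)",
    "20 Days MA ADR(%)",
    "30 Days MA ADR(%)",
    "RVOL",
    "% from ATH",
    "Returns since Earnings(%)",
    "Max Returns since Earnings(%)",
]


def count_missing_ohlcv(path: str) -> int:
    """Hypothetical helper: count records whose OHLCV fields are all 0, null, or absent."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        records = json.load(f)  # assumed layout: a list of dicts
    return sum(
        1
        for rec in records
        if all(rec.get(field) in (0, None) for field in OHLCV_FIELDS)
    )
```

If the count is close to the total number of records, the file was most likely generated with `FETCH_OHLCV = False`.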